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Abstract 

Parameterless stopping criteria for recursive polynomial expansions to construct 
the density matrix in electronic structure calculations are proposed. Based on conver¬ 
gence order estimation the new stopping criteria automatically and accurately detect 
when the calculation is dominated by numerical errors and continued iteration does 
not improve the result. Difficulties in selecting a stopping tolerance and appropriately 
balancing it in relation to parameters controlling the numerical accuracy are avoided. 
Thus, our parameterless stopping criteria stand in contrast to the standard approach to 
stop as soon as some error measure goes below a user-defined parameter or tolerance. 
We demonstrate that the stopping criteria work well both in dense and sparse ma¬ 
trix calculations and in large-scale self-consistent field calculations with the quantum 
chemistry program Ergo (www.ergoscf.org). 


1 Introduction 

An important computational task in electronic structure calculations based on for example 
Hartree-Fock- or Kohn-Sham density functional theory—^ is the computation of the one- 
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electron density matrix D for a given Fock or Kohn-Sham matrix F. The density matrix is 
the matrix for orthogonal projection onto the subspace spanned by eigenvectors of F that 
correspond to occupied electron orbitals: 


Fxi = A iX h 

(1.1) 

'Focc 

D ■■= J2 X i X i’ 

2 — 1 

(1.2) 


where the eigenvalues of F are arranged in ascending order 


(1.3) 


n occ is the number of occupied orbitals, Ahomo is the highest occupied molecular orbital 
(homo) eigenvalue, and Ai umo is the lowest unoccupied molecular orbital (lumo) eigenvalue 
and where we assume that there is a gap 


£ '— Al umo Ahomo 0 (1-4) 

between eigenvalues corresponding to occupied and unoccupied orbitals. An essentially direct 
method to compute D is to compute an eigendecomposition of F and assemble D according to 
(ll.2l) . Unfortunately, the computational cost of this approach increases cubically with system 
size which limits applications to rather small systems. Alternative methods have therefore 
been developed with the aim to reduce the computational complexity.- One approach is to 
view the problem as a matrix function 

D — 0(fil — F), (1.5) 

where 9 is the Heaviside function and /i is located between Ahomo and Ai umo , which makes 
(11.511 equivalent to the definition in (I1.2H . 5 A condition number for the problem of evaluating 
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(I1.5p is given by 


k '■= lim sup 

h ~>° A:||A||=Ae 


H0(mI - (F + hA)) - 0(Ml - F)H 
h 


Ae 

T 


( 1 . 6 ) 


where Ae is the spectral width of F.—~— We let ||A|| = Ae to make the condition number 
invariant both to scaling and shift of the eigenspectrum of F. 1 

When the homo-lumo gap £ > 0, a function that varies smoothly between 0 and 1 in the 
gap can be used in place of (11.51) . To construct such a function, recursive polynomial ex¬ 
pansions or density matrix purification have proven to be particularly simple and efficient 
The regularized step function is built up by the recursive application of low-order polyno- 


Algorithm 1 Recursive polynomial expansion (general form) 

1: Xp = fo(F) 

2: Xq = Xq + Eq 

3: while stopping criterion not fulfilled, for i — 1,2,... do 
4 : Xi = MXi^) 

5 : Xi = Xi + Ej 

6: end while 


rnials /o, fi ,..., see Algorithm [TJ With this approach, a linear scaling computational cost is 
achieved provided that the matrices in the recursive expansion are sufficiently sparse, which 
is usually ensured by removing small matrix elements during the course of the recursive 
expansion.— In Algorithm Q] the removal of matrix elements, also called truncation, is writ¬ 
ten as an explicit perturbation E) added to the matrix in each iteration. Several recursive 
expansion algorithms fitting into the general form of Algorithm [T] have been proposed. Note 
that here we are considering methods that operate in orthogonal basis. The function /o is 
usually a first order polynomial that moves all eigenvalues into the [0, 1] interval in reverse 
order. A natural choice for the iteration function /*, i = 1,2,... is the McWeeny polynomial 
3a: 2 — 2a; 3 , 11,12 which makes Algorithm [1] essentially equivalent to the Newton-Schulz itera¬ 
tion for sign matrix evaluation.- Furthermore, algorithms were developed that do not require 
beforehand knowledge of fi. Palser and Manolopoulos proposed a recursive expansion based 
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on the McWeeny polynomial.— Niklasson proposed a simple and efficient algorithm based 
on the second order polynomials x 2 and 2x — x 2 .— We will refer to this algorithm as the 
SP2 algorithm. The recursive application of polynomials gives a rapid increase of the poly¬ 
nomial order and the computational cost increases only with the logarithm of the condition 
number.- 1 ^ The computational cost can be further reduced by a scale-and-fold acceleration 
technique giving an even weaker dependence on the condition number.— Recursive expan¬ 
sion algorithms are key components in a number of linear scaling electronic structure codes 
including CP2K,— Ergo,— ^ FreeON,— Honpas,— and LATTE.— Since most of the 
computational work lies in matrix-matrix multiplications, recursive expansion algorithms are 
well suited for parallel implementations^! - — and a competitive alternative to diagonalization 
also in the dense matrix case.— ^ 

Different ways to decide when to stop the iterations have been suggested. A common 
approach is to stop when some quantity, measuring how far the matrix is from idempotency, 
goes below a predetermined convergence threshold value. The deviation from idempotency 
has been measured by the trace^!^ or some matrix norm-^^^ - — of X t — X 2 sometimes 
scaled by for example the matrix dimension. However, since the recursive expansion is at 
least quadratically convergent, what one usually wants is to continue iterating until the 
idempotency error does not anymore decrease substantially. This happens when any further 
substantial decrease is prevented by rounding errors or errors due to removal of matrix 
elements. 

To find a proper relation between matrix element removal and the parameter measuring 
idempotency can be a delicate task, often left to the user of the routine. However, a few 
attempts to automatically detect when numerical errors start to dominate exist in the lit¬ 
erature. Palser and Manolopoulos noted that with their expansions, the functional Tr[XiF] 
decreases monotonically in exact arithmetics and suggested to stop on its first increase which 
should be an indication of stagnation.— A similar criterion for the SP2 expansion was pro¬ 
posed by Cawkwell et al— In this case, the iterations are stopped on an increase of the 
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idempotency error measure |Tr[X, : — X?]\. However, the value of the functional or the idem- 
potency error may continue to decrease without significant improvement of the accuracy. In 
such cases, the computational effort in last iterations is no longer justified. In the present 
work, we propose new parameterless stopping criteria based on convergence order estima¬ 
tion. The stopping criteria are general and can be used both in the dense and sparse matrix 
cases using different strategies for truncation, and with different choices of polynomials. 

2 Parameterless stopping criteria 

The iterations of density matrix expansions can be divided into three phases:— 1 ^ 1) the con¬ 
ditioning phase where the deviation from idempotency decreases less than quadratically or 
not at all, 2) the purification phase where the idempotency error decreases at least quadrat¬ 
ically, and 3) the stagnation phase where the idempotency error again does not decrease 
significantly or at all, see Figure [7TT1 

Here, we propose new parameterless stopping criteria designed to automatically and 
accurately detect the transition between purification and stagnation, so that the procedure 
can be stopped without superfluous iterations in the stagnation phase. We measure the 
deviation from idempotency by 



( 2 . 1 ) 


We recall that an iterative method has asymptotic order of convergence q if it in exact 
arithmetics generates a sequence of errors e 1 ,e 2 , ■ ■ ■ such that 


lim -p— = C' 

i —^ rv—i O 


( 2 . 2 ) 


where C°° is an asymptotic constant. The order of convergence can also be observed numer¬ 
ically by 



(2.3) 
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conditioning purification stagnation 


Figure 2.1: Illustration of the three phases for a recursive expansion of order q — 2 based on 
the polynomials x 2 and 2x — x 2 (see Section HD • In the conditioning phase the matrix does 
not come closer to idempotency but the condition number, k, is lowered. In the purification 
phase the condition number is close to 1 and idempotency is approached quadratically. In 
the stagnation phase numerical errors start to dominate and the matrix again does not 
come closer to idempotency. The upper panel shows what we call the observed orders of 
convergence g* and r \. Throughout the conditioning and purification phases rg > 2 but in 
the stagnation phase r t < 2. 
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Our stopping criteria are based on the detection of a discrepancy between the asymptotic 
and observed orders of convergence. When the stagnation phase is entered numerical errors 
start to dominate, leading to a fall in the observed order of convergence, see the upper panel 
in Figure EU 

Since the observed order can be significantly smaller than the asymptotic order also in 
the initial conditioning phase, an issue is how to determine when the purification phase has 
started and one can start to look for a drop in the order gp A similar problem of determining 
the iteration when to start to check the stopping criterion appears in the method described 
by Cawkwell et al— Our solution is to replace the asymptotic constant C°° in (j2.3j) with a 
larger value such that the observed order of convergence in exact arithmetics is always larger 
than or equal to the asymptotic order of convergence. In other words we want to find C q , as 
small as possible, such that in exact arithmetics 


A 


lOg (Cj/Cg) > ^ 
log(et-i) 


(2.4) 


for all i. One may let C q vary over the iterations but we will later see that it is usually 
sufficient to use a single value C q for the whole expansion. Important is that, in exact 
arithmetics, rg > q for all i. In the presence of numerical errors rg is significantly smaller than 
q only in the stagnation phase. We may therefore start to look for a drop in r* immediately. 
As soon as the observed order of convergence, ip goes significantly below q, the procedure 
should be stopped, since this indicates the transition between purification and stagnation. In 
this way we avoid the issue of detecting the transition between conditioning and purification. 
See the upper panel in Figure 12711 for an illustration. 

For clarity we note that (12.4j) is equivalent to 


C q > 


\\fi{Xi-i) - i) 2 I| 2 


p q 

e i-l 




X? 


i—1 


(2.5) 


For generality and simplicity we want to assume as little information as possible about 
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the eigenspectra of X i} i = 0,1,.... We will use the following theorem to find the smallest 
possible C q value fulfilling (j2.5j) with no or few assumptions about the location of eigenvalues 
for several recursive expansion polynomials of interest for appropriate choices of q. 

Theorem 1. Let f be a continuous function from [0, 1] to [0, 1] and assume that the limits 


lim 

>-a+ 


f(x) - fix ) 2 

(x — X 2 )l 


and 


lim 

x—y (1—cl) 


f(x) ~ f(x) 2 

(.X — X 2 Y 


( 2 . 6 ) 


exist for some q > 0, where a E [0, 0.5]. Let H denote the set of Hermitian matrices with 
all eigenvalues in [0, 1] and at least one eigenvalue in [a, 1 — a]. Then, 


max 

X&H 


\\f(X) - f{X)% 
||X-A^ 


max q(x,a ), 
xe[o,i] 


(2.7) 


where 

g(x, a) 

is extended by continuity at x = 


J W J 


{x-x 2 Y L J a - X - 
otherwise, 

0 and x = 1 for a = 0. 


( 2 . 8 ) 


As suggested by Theorem dj we will choose C q := max^^p,!] g(x, a) thereby making sure 
that (12. 5 j) is fulhlled. In principle, the value a should be chosen as large as possible to get 
the smallest possible C q - value, since a larger a gives a smaller set of matrices H in (I2.7|h We 
note that it is possible to let C q vary by choosing the largest possible a in every iteration, but 
in this work we will attempt to use a single value for the whole expansion whenever possible. 
There is always at least one eigenvalue in the interval [0, 1] in every iteration. Therefore, if, 
for the given recursive expansion polynomials /j, the limits (12.61) exist with a = 0 we employ 
the theorem with a = 0. Only if the limits (12.6j) do not exist with a = 0 will we use the 
theorem with a > 0 and in general get different values of C q in every iteration. In such a 
case some information about the eigenspectrum of X t in each iteration i is needed so that 
a can be chosen appropriately. The theorem should be invoked with q equal to the order of 
convergence of the recursive expansion. 














Proof of Theorem\T\ Continuity of f(x) together with existence of the two limits (12.61) (needed 
in case a = 0) implies existence of max^^i] g(x, a). Let X be a matrix in H. Then, 


\\f(X)-f(X)% 

WX-XT 2 


^ f{x) ~ f(xf 

< max max — -—— 

xe[0,i] ye[a,i-a] (y - y 2 ) q 

|y—0.5|<|z—0.5| 


= max max 


max 


f{x) - fjx) 2 

x£[a,l-a] |j/-0.5|<p—0.5| {%) — I/ 2 ) 9 

fix) - f(x) 2 \ 

max max —7 -—- 

a:6[0,l]\[o,l-o] j/G[o,l-a] [y — y 2 ) q J 

( f{x) - f(x) 2 

= max max — -——, 

ya:G[a,l-a] [X — X Z ) q 


fix) - f(xf 
x€[0,l]\[a,l-a] (a — a 2 ) q 


max 


= max q(x,a). 
xe[o,ir v ’ 


(2.9) 

( 2 . 10 ) 


( 2 . 11 ) 


( 2 . 12 ) 


The maximum in (12.71) is attained for any matrix in H with an eigenvalue A = arg max a . e r 0il i g{x, a) 


□ 


Remark 2.1. Inequality (12.91) can be interpreted as follows. The arguments of the maxima, 
x andy, play the roles of the eigenvalues of X that define ||/(X) —/(X ) 2 || 2 and ||X —X 2 || 2 , 
respectively. While, for all we know, x can be any eigenvalue, y is necessarily the eigenvalue 
of X that is closest to 0.5. In other words, 


y = arg min |A — 0.5 

AE{Aj} 


(2.13) 


and 

\\X ~ X 2 \\2 = max A - A 2 = y - y 2 , (2.14) 

AE\A^ j- 

where {A,} are the eigenvalues of X. Since there is at least one eigenvalue at x and at least 
one eigenvalue in [a, 1 — a] the constraints on y in ( 12 .9p follow. 

By construction, the observed convergence order r* may in exact arithmetics end up 
exactly equal to q. We want to detect when numerical errors start to dominate and drops 


9 












significantly below q but at the same time we want to disregard small perturbations that 
cause r\ to go only slightly below q. In practice we will therefore use a parameter q < q 
in place of q. In other words, our stopping criteria will be on the form stop as soon as 
log (d/Cq)/ log(ej_i) < q. In this work we will mainly consider second order expansions with 
q = 2 and will then use q = 1.8. 

The spectral norm is often expensive to compute and one may therefore want to use an 
estimate in its place. One cheap alternative is the Frobenius norm 


Vi 


\\Xi 


X?\\F = 




(2.15) 


where { Xj } are the eigenvalues of X t . We expect the Frobenius norm to be a good estimate 
to the spectral norm when we are close to convergence since then many eigenvalues are 
clustered around 0 and 1 and do not contribute significantly to the sum in (12.151) . Since 
||Xi — X 2 || 2 < ||Xj — Xf\\ F we have that 


Vi = Kiei 


(2.16) 


for some K t > l. Assume that for a given q 


K, < K_v 


(2.17) 


Then, 


= K,e, < = C.vl i, 


(2.18) 


and, assuming that Uj_i < 1, 


log (Vi/Cg 


> q, 


(2.19) 


log(nj_i) 

which means that if the assumptions above hold we will not stop prematurely. In the follow¬ 
ing sections we will see under what conditions (12.1711 holds for specific choices of polynomials 
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for the recursive expansion. 

For very large systems, it may happen that ry never goes below 1 because of the large 
number of eigenvalues that contribute to the sum in (12.151) . in such a case the stopping 
criterion in the suggested above form will not be checked. One may then make use of the 
so-called mixed norm.— Let the matrix be divided into square submatrices of equal size, 
padding the matrix with zeros if needed. One can define the mixed norm as the spectral 
norm of a matrix with elements equal to the Frobenius norms of the obtained submatrices. 
It can be shown that 


ei<rrii< v t (2.20) 

where m; := \\X t — A 7 " 2 ^ is the mixed norm of X t — Xf in iteration i. The result for the 
Frobenius norm above that we will not stop prematurely therefore holds also for the mixed 
norm under the corresponding assumptions, i.e. (12.161) and (12.171) with Vi replaced by m, and 
rrii -1 < 1. However, with a fixed submatrix block size the asymptotic behavior of the mixed 
norm follows that of the spectral norm. Thus, the quality of the stopping criterion will not 
deteriorate with increasing system size. One may therefore consider using the mixed norm 
for large systems. 

We will in the following mainly focus on the regular and accelerated McWeeny and second 
order spectral projection polynomials. 


3 McWeeny polynomial 


We will first consider a recursive polynomial expansion based on the McWeeny polynomial 
Pmw(x) '■= 3x 2 — 2x 3 ,— ^ i.e. Algorithm Q] with 


fo(x) = 


fl — X 


2 max(A max /i, n A m j n ) 
fi{x) = Pmw(x), i — 1,2, ... . 


0.5, 


(3.1) 
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Here, A min and A max are the extremal eigenvalues of F or bounds thereof, i.e. 


Amin h; A i and A max -A A n- 


(3.2) 


We want to find the smallest value C™ w such that 


logfe/cro 

log(ei-i) 


(3.3) 


in exact arithmetics. Note that 


hm --775-= lim -7^-7--= 3 . 

a:—>0+ (£ — X z ) z i->l- (X — X z ) z 


(3.4) 


Thus, we may invoke Theorem [0 with / = p mw , a = 0 and q = 2 which gives 


CF W = max 


Pmw(s) ~ Pmw{xf 


xG[0,l] (x — X 2 ) 2 


= max (3 + 4x — 4a; 2 ) = 4 
xe[o,i] 


(3.5) 


and we suggest to stop the expansion as soon as 


log(e i /4) 

l°g( e i-l) 


< 14 


(3.6) 


We note that the McWeeny polynomial can be defined as the polynomial with fixed 
points at 0 and 1 and one vanishing derivative at 0 and 1. For polynomials with fixed points 
at 0 and 1 and q — 1 vanishing derivatives at 0 and 1, sometimes referred to as the Holas 
polynomials,— it can be shown that the smallest value C £ such that 


log (ei/Cq) 

l°g( e «-i) ~ Q 


(3.7) 


is C J = 4*- 1 . 
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3.1 Acceleration 


Prior knowledge of the homo and Inrno eigenvalues makes it possible to use a scale-and- 
fold technique to accelerate convergence.— In each iteration the eigenspectrum is stretched 
out so that the projection polynomial folds the eigenspectrum over itself. This technique 
is quite general and may be applied to a number of recursive expansions making use of 
various polynomials. In case of the McWeeny polynomial the eigenspectrum is stretched out 
around 0.5, see Algorithm [21 The amount of stretching is determined by the parameter £ 
which is an estimate of the homo-lumo gap £ such that at least one eigenvalue of F is in 
[/i — £/2, n + £/2], With a = 1 on line El Algorithm El reduces to the regular McWeeny 
expansion discussed above. 


Algorithm 2 McW-ACC algorithm 


1 

2 

3 

4 

5 

6 

7 

8 
9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 


input. Fj A min , A max , /r, £ 

7 = 2 max(A m a X -//,//- A min ) 

Po = 0.5(1 - £/y) 

Xo = ^ + 0.5 1 

A"o = Ap + Eo 
e o — ||A 0 — Aq|| 2 

for %— 1 , 2 ,... do 

a = 3/(12/9?-i - 18A-i + 9) 1/2 
X s = a(Xi_i — 0.5 1) + 0.5 1 
X, = 3 A; - 2 A; 

Ps = a{Pi -1 - 0.5) + 0.5 
A = 3/3 2 -2/3, 3 
Aj = Aj T if 1 ,; 
e, = ||A)-A 2 || 2 
if < i.8 then 

log(ei-i) 

n = i 

break 
end if 
end for 
output: n, X n 


As for the regular McWeeny expansion we want to find the smallest value C'™ wa such that 


log(e i /C' 2 mwa ) 


> 2. 


(3.8) 









For the sake of analysis we introduce 


Pmwa(x, /3) := p mw (3(12/5 2 - 18/3 + 9)-V 2 (x - 0.5) + 0.5 

) • (3.9) 

Note that line fTOl in Algorithm [2] is equivalent to X % = p mwa (Xj_i,/3j_i). 

Furthermore, let 

\gi{x,/3) if X e [/3, 1 - /3], 

9(x,(3) = l 

(3.10) 

\g 2 (x,/3) otherwise, 


where 


/ n\ Pmwa(*^5 ft) Pmwa(*^5 ft) 

Sl(l ’ /3)= (x-xT 

(3.11) 

/ n\ Pmwa(^5 ft) Pmwa(^? ft) 

92(I ’«= (/? — /3 2 ) 2 

(3.12) 


which are plotted in Figure EID for (3 = 0.3. 




x x 

Figure 3.1: The left panel shows the regular p mw (x ) and accelerated p mwa (x,/3 ) McWeeny 
polynomials with (3 = 0.3. The right panel shows the functions gi(x,(3) and g 2 (x,/3) with 
/3 = 0.3. Black vertical dashed lines indicate (3 and 1 — (3. 


Note that when (3 > 0, the limits lim, i ._ i >o+ gi(x, (3) and lim,^!- gi(x, (3) do not exist. 
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However, we may use Theorem Q] with a > 0. We have that 


limg 1 (x,/3)= lim g x (x,P) 

x-»/3+ 


16(/3 - 1.5) 2 (/3 — 0.75) 2 
{P - l) 2 (4 f3 2 - 6/3 + 3) 3 


3 


if P — 0. 


(3.13) 


Note that 


lim lim gi(x,P)^ lim gi(x, 0) (3-14) 

,3—>0+ x—x— >-0+ 

due to discontinuity of gi(x,fi) at x = 0 for /3 > 0. Since there is at least one eigenvalue in 
[yu, — ^/2, /i + £/2], at least one eigenvalue of Xi will be in the interval [Pi, 1 — Pi] in each 
iteration i. Therefore, we may for each iteration i invoke Theorem Q] with f{x) = p mw a (x, Pi), 
a = Pi and q — 2. We have that 

max di, ft) = max max qi(x,Pi), max o 2 (x, A 
ze[o,i] v V ^ 

— S'! (0.5, Pi) = 4, (3.16) 


(3.15) 


where we used that for P > 0 the function g 2 (x,P) is convex on the intervals [0, /?] and 
[1 — P, 1], and 


.max g 2 (x,P) = g 2 (0,P) = g 2 (l,P) = g 2 (P,P) = ~ P,P) (3.17) 

®e[o,i]\[/9,i-/3] 

= gi(P,P)< max gi(x, P). (3.18) 

xe\p,i-p\ 

Therefore, C'™ wa = 4, which gives the stopping criterion in Algorithm [21 

3.2 Estimation of the order of convergence 

As discussed earlier one may want to use the Frobenius norm in place of the spectral norm 
to measure the idempotency error. It is then desired that (12.171) is fulfilled, at least in 
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iterations prior to the stagnation phase. Let X be a Hermitian matrix with eigenvalues 
{A j : 0 < Xj < 1} and let 


y = arg min |A — 0.5| > 0, (3.19) 

Ae{Aj} 

/ S = min(j/, 1-y). (3.20) 

Then, 

||X-X 2 || 2 = P-P 2 and (3.21) 

||p mw (X) - p mw (X) 2 || 2 = p mw (/3) - p mw {/3) 2 . (3.22) 


Consider the function 


X ~ X 2 _ Prg W (x) - p mw {x) 2 

P ~ /3 2 p mw (P) - Pm W (P) 2 

x (x — 1) (4 P A — 8 /3 3 + P 2 + 3 (3 — 4 x A + 8 x 3 — x 2 — 3 x) 
P 2 (P-lf (—4 /3 2 + 4/3 + 3) 


(3.23) 

(3.24) 


The roots of this function on the interval [0, 1] are 0, 1, P, and 1 — P, and the function is 
non-negative on the intervals [0, P] and [1 — P, 1]. Therefore 


X X Pmw(^) Pmw('J') ^ q 

P~ P 2 ~ Pmw{P) ~ Pmw(P ) 2 ~ 


(3.25) 


for x — x 2 < P — P 2 and 


\\x ^ XYf \\v^{X) - p mw {Xf\\l 

\\x - x% \\ Pmw (x) - Pmw (xni 

E j ((Pmw(Aj) - Pmw(^j) 2 )/(PmwiP) ~ Pmw(P) 2 )f ~ 
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K 2 

Thus, we have that -^r- > 1 for every i > 1. Since K^\ > 1 it follows that 

i 

Kf _! > AVi > A'j, (3.27) 

i.e. (12.1711 is fulhllcd when using the regular McWeeny expansion. 

However, in case of the accelerated McWeeny expansion inequality (12.1711 is not always 
satisfied. Consider for example the diagonal matrix 

X = diag(0, 0,0, /3,1-/3). (3.28) 


Then, 


Pmwa(0) /3) Pmwa(0, /?) Pmwa(/d; Pmwa(/d; ft) 


(3.29) 


and 


|lX-X 2 |l^|lp mwa (X)-p mwa (X) 2 H 2 

ll^-^ 2 llilb m wa(x)-p mwa (w) 2 || F J 

2 (/3 - /3 2 ) 2 (p mwa (/3, /3) - Pmwa(/3, /3) 2 ) 

(P - P 2 ) 2 V 5 (P-Wfl) - Pmw.(/),/3) 2 ) 2 

= -^= « 0.8944. (3.31) 

v 5 

K? 

Thus, for the accelerated McWeeny expansion it may happen that —< 1 for some i. 

To summarize, for the regular McWeeny scheme we will not stop prematurely when the 
Frobenius norm is used in place of the spectral norm. In the accelerated scheme we suggest 
to turn off the acceleration at the start of the purification phase when its effect anyway is 
small. After that the regular scheme is used and the Frobenius norm may be used. We 
will discuss how to detect the transition between the conditioning and purification phases in 
Section [fl 
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4 Second order spectral projection (SP2) expansion 


In the remainder of this article we will focus on recursive polynomial expansions based 
on the polynomials (x) = x 2 and p^(x) = 2x — x 2 proposed by Mazziotti.— In an 
algorithm proposed by Niklasson the polynomials used in each iteration are chosen based on 
the trace of the matrix.— This algorithm, hereinafter the original SP2 algorithm, is given by 
Algorithm [21 including the new stopping criterion proposed here. Knowledge about the homo 


Algorithm 3 The original SP2 algorithm with new stopping criterion 

1 

input. F ? A min , A max 


2 

V _ Vax^ F 

A \ . 

/'max /'min 


3 

X 0 = X 0 + Eq 


4 

e 0 = ||Xo-X 0 2 || 2 


5 

Cf = X (71 + Yly/V?) 


6 

for i — 1 ,2 ,... do 


7 

if Tr[A7_i] > n occ then 


8 

Xi = X?_ ;L 


9 

Pi = 1 


10 

else 


11 

Xj = 2Xi_i - Xl, 


12 

Pi = 0 


13 

end if 


14 

Xj = Xj + Ei 


15 

e i =\\X i -X?\\ 2 


16 

if i > 2 and pi ^ pi_\ and 

l( f j/ c F < 1.8 then 

log(ei_ 2 ) 

17 

n = i 


18 

break 


19 

end if 


20 

end for 


21 

output: n, X n 



and lumo eigenvalues also makes it possible to work with a predefined sequence of polynomials 
and employ a scale-and-fold acceleration technique- 1 -! as described in Algorithms ED and 0 
respectively. In Algorithm Efl A£"mo and AjE, mo are bounds of the homo eigenvalue such that 


\out x <■ \ in 

/v homo — '"'homo / 'homoi 


(4.1) 
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Aj£ mo and A°^ in are bounds of the lumo eigenvalue such that 


Aj" „ < A 


lumo — ^lumo — ^lumo’ 


^ \ out 

A, 


(4.2) 


and Em denotes the machine epsilon. 

Using the scale-and-fold technique the eigenspectrum is stretched out outside the [0, 1] 
interval and folded back by applying the SP2 polynomials. The unoccupied part of the 
eigenspectrum is partially stretched out below 0 and folded back into [0, 1] using the poly¬ 
nomial ((1 — a)I + ax) 2 , where a > 1 determines the amount of stretching or acceleration. 
Similarly, the occupied part of the eigenspectrum is stretched out above 1 and folded back 
using 2 ax — (ax) 2 . 

In the following we will make use of Theorem |T] to derive the stopping criteria employed 
in Algorithms [3] and [5j We first note that neither of the limits 


lim 

X— >1“ 



and 


lim 

x— >-0+ 




x 




x 


(x — X 


212 


(4.3) 


exist. 

The limits required by Theorem |Tj do exist for compositions of alternating polynomials 

Pi p 2) (^) =PS(pip( x )) and = p£ ) (pil ) ( x )) : 


lim 

X-40+ 


pif ] (x) -pf}Hx ) 2 


'sp 


( 12 ) f x) — pg 2 gx ) 2 


(x — X 2 ) 2 


/A)p ' [x) ~ P sp 
xiT (x-x 2 ) 2 


lim 

X^l~ 


pg 1 } (x) -pg 1 } (x ) 2 


\x — X 


sp 

212 


P, 


lim - 

Z-1-0+ (x 


(12) fx) — p^ 2 \x) 2 


X 


2\2 


2 , 

4. 


(4.4) 

(4.5) 


However, the limits 


Pip 2) ( x ) ~Pip 2) (x 


hm -— 

S-10+ (x — x z y 


and 


lim 

x— >i _ 


pg 1 } (x) -pgb(x) 
(x — X 2 ) 2 


(4.6) 
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Algorithm 4 Determination of polynomials for SP2-ACC homo/lumo-based expansion 

1 

| nnn +. \ \ \out \ in \ in tout 

iiijjLiL. A mln , /\ max , /'homo) / 'homo! ^Inmo? ^lumo 

2 

\ \ ln 

Q _ 1 Amax nomo 

Mup X X _ \ . 


3 

\ \ out 

Ac = 1 T“ X 

Amax A m i n 


4 

\ \in 

_ Amax A lumo 

/up \ \ 

1 /'max / >min 



\ \out 


5 



6 

5 = 0.01 


7 

i = 0 


8 

while (/3 up - > e M or y up - 

- 7u P > £ m ) and p t = p t _i do 

9 

i = i + 1 


10 

if Ao < <5 and 7 i 0 < 5 then 

11 

Ao o, tio o 


12 

^min — i + 1 


13 

(5 = 0 


14 

end if 


15 

if 7u P > P up then 


16 

Pi = 1 


17 

^5 

1 

CN 

CN 

8* 


18 

7io = ((! - «*) + «*7io) 2 , 

Tup ((1 Oli) + C^zTup) 

19 

Ao 2a* Ao (c^iAo) i 

up 2a*Aip (^z/3 up) 

20 

else 


21 

Pi = 0 


22 

<*i = 2/(2 - Ao) 


23 

0 

to 

Q 

A 

0 

1 

IT 

A 

0 

bO 

Tup 2o^Tup (AzTup) 

24 

Ao ((1 ®-i) T" QaAo) 

/?up ((1 &i) T~ ^zAip) 

25 

end if 


26 

end while 


27 

Umax = i 


28 

Olltpilt. 72 m j n , Timaxj Pi ? ^z? ^ 

1 ,2 ,..., W max 
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Algorithm 5 SP2-ACC algorithm 


input: F, A min , A 


max; Urn in) 


Tir, 


c , Pi, CKj, 2 = 1,2, 




An = 


*I-F 


■^max An 


A 0 — Ag + Ep 
e o — ||A 0 — Aq|| 2 
Cf = ^ (71 + 17\/l7) 
for i = 1 , 2 ,... ,n max do 
if pi = 1 then 

A , = ((!-«,)/ + a , A ,.!) 2 


9 

10 

11 

12 

13 

14 

15 

16 

17 

18 
19 


else 

Aj 2a i X i _ 1 (ttjAj_i) 

end if 

A i = Aj A Ei 

e i — \\Xi — A 2 || 2 

if i > n min and p ?; 7 ^ p*_i and 1 < 1.8 then 

n = i 

break 
end if 
end for 
output: n, X n 


do not exist. We therefore want to find the smallest value C 2 P such that 


log {e-i/Cf) 
log(ej_ 2 ) 


(4.7) 


provided that pi 7 ^ pj_i. Consequently, we invoke Theorem Q] with / = p^ and / = p( 21 \ 
a = 0, and q — 2. Noting that 


max 

rre[0,l] 


pg 2) (x) 


pg 2 ) (x ) 2 


(x — X 2 ) 2 


max 

ze[o,i] 


P, 


( 21 ) ( 
sp V 


X 


V, 


( 21 ) ( 
sp V 


X 


(x — X 2 ) 2 


32 


71 + 17717 


(4.8) 

(4.9) 


we get 


Cf = i- (71 + 17Vl7) « 4.40915 


(4.10) 
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which leads to the stopping criterion of Algorithm [3J 

Theorem [T] can also be used to derive a stopping criterion for the accelerated algorithm. 
However, in order for the limits (12.6ft to exist the parameter a should be chosen larger 
than 0 and vary throughout the iterations. Since the acceleration is effective only in the 
conditioning phase, we have found it easier to turn off the acceleration when entering the 
purification phase and then use the stopping criterion for the regular expansion. We turn off 
the acceleration as soon as the relevant homo and lumo eigenvalue bounds are close enough 
to 1 and 0 respectively, and start to check the stopping criterion in the next iteration n m in . 
see lines ITUl ITT of Algorithm SI 



- ^ - U m ax 

—*— n, t = 10~ 9 
—e-n, t = 10~ 3 
“ V ” ^min 


Figure 4.1: The estimated number of iterations n max , iteration n min where the acceleration 
has been turned off, and total number of iterations n in the SP2-ACC expansion for various 
values of 5. The recursive expansion is applied to the matrix X 0 = diag(0.48, 0.52) perturbed 
in each iteration by a diagonal matrix with random elements from a normal distribution and 
with spectral norm r. 


There is no need for accurate detection of the transition between the conditioning and 
purification phases—the parameter 5 in the algorithm is to some extent arbitrary. A larger 
(5-value results in less acceleration. A smaller value means that we will start to check the 
stopping criterion later possibly resulting in superfluous iterations, particularly in low accu¬ 
racy calculations, see Figure I4~T1 By numerical experiments we have found S = 0.01 to be 
an appropriate value. For values smaller than 0.01 the effect of the acceleration is less than 
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1 percent compared to the regular iteration, see Figure 021 



Figure 4.2: Efficiency of the accelerated SP2-ACC scheme relative to the regular SP2 scheme. 
The figure shows how much more an unoccupied eigenvalue A is reduced by the polynomial 
(1 — a + aA) 2 with a = 2/(2 — A) compared to the polynomial A 2 , in the relative sense, i.e. 
|A 2 — (1 — a + aA) 2 |/|A — A 2 |. The corresponding (identical) figure for an occupied eigenvalue 
can be constructed in the same way. Clearly the acceleration has almost no effect in the 
purification phase when all eigenvalues are close to their desired values of 0 or 1. The plot 
is in log-log scale. 


4.1 Estimation of the order of convergence 

As for the McWeeny expansion one may want to use the Frobenius norm instead of the 
spectral norm to measure the idempotency error. However, for the SP2 expansion, relation 
(12.171) translates to 

Kl 2 /Ki> 1, if Vi + Pi-i (4-11) 

given second order convergence and the application of Theorem Q] for compositions of alter¬ 
nating polynomials from two iterations. We have not been able to prove that (I4.11|) always 
holds. However, we have also not been able to fold any counterexample where (14.111) does 
not hold in exact arithmetics. 

We have encountered cases when (14.111) does not hold due to numerical errors. However, 
the use of the Frobenius norm has not resulted in too early stops in those cases. An example 
is given in Figure 14.31 where we applied the recursive expansion to a random symmetric 
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dense matrix. The occupied and unoccupied eigenvalues were distributed equidistantly in 
[0, 0.495] and [0.505, 1], respectively. The eigenvectors of the matrix were taken from a QR 
factorization of a matrix with random elements from a normal distribution. The use of the 
Frobenius norm in the stopping criterion results in a stop in iteration 29 while the use of 
the spectral norm results in a stop in iteration 31. However, being clear from panels (c) and 
(d), the stagnation phase started already in iteration 29. Although (14.111) does not hold, the 
use of the Frobenius norm does not result in a too early stop in this case. 

4.2 Number of subsequent iterations with the same polynomial 

The stopping criteria in Algorithms [3] and 0 include a condition of alternating polynomials, 
i.e. that Pi ^ in iteration i. When the polynomials in each iteration are chosen based 
on the trace of the matrix as in Algorithm [3l it is possible to construct examples where the 
same polynomial appears in an arbitrary number of subsequent iterations. In particular, you 
will get a large number of consecutive iterations with the same polynomial if there are many 
eigenvalues clustered very close to either the homo or the lumo eigenvalue. However, besides 
artificially constructed matrices we have not come across Fock or Kohn-Sham matrices that 
give more than a few subsequent iterations with the same polynomial. When the polynomials 
are chosen based on the location of the homo and lumo eigenvalues as in Algorithm U the 
number of subsequent iterations is strictly bounded. When Algorithm Q] is used without 
acceleration there can after an initial startup phase be at most two subsequent iterations 
with the same polynomial, as shown by the following theorem. 

Theorem 2. In Algorithm let /3 up ^ 0, 7 up 7 ^ 0, /3 ; 0 = 0, and 7 1 0 = 0. Assume that 
Pi ^ Pi—i ■ Then, if p i+ 1 = p t it follows that p i+2 ^ p i+ \. 

Proof. We use here the notation 


A := Aip and 7 * := 7 up 


(4.12) 
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(a) Observed and estimated orders of (b) Kf_ 2 /Ki 

convergence 



Iteration 

(c) Idempotency error using 
spectral norm 



Iteration 

(d) Idempotency error using 
Frobenius norm 


Figure 4.3: Example illustrating a special case when use of the Frobenius norm in the 
stopping criterion results in an earlier stop than use of the spectral norm. The regular SP2 
expansion is applied to a random symmetric dense 1000 x 1000 matrix with all eigenvalues in 
[0, 1], homo-lumo gap 0.01 located symmetrically around 0.5, and occupation number 500. 
In each iteration i the matrix X t is perturbed by a random symmetric matrix with elements 
from a normal distribution and with spectral norm 10” 5 . 


in every iteration i of the recursive expansion. Without loss of generality we consider the case 
i — 2. Assume that pi — 1. Then, /3 0 < y 0 and the largest possible number of subsequent 
iterations with Pk = 0, k > 2 is obtained with /3 0 = 7o- Then, following Algorithm Q] we 
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have that 


7i = To, (4-13) 

Pi = 2/3o-/3 0 2 = 2 7 o-7o 2 - (4-14) 

Since fii > 71 , p 2 = 0 and 

72 = 2 7 i - 7i = 2 7o ~ it (4.15) 

P 2 = Pi = (2 7 o - 7o) 2 = 27 o - 7o + 2 7o(7o - l) 2 - (4.16) 

Then, since 27^(70 — l ) 2 > 0, we have that 02 > 72 and p% = 0. Therefore 

7s = 272 -72= 2 ( 2 7o - 7o) - ( 2 7o - 7o) 2 = 4 7o “ 6 7o + 4 7o “ 7o> ( 4 - 17 ) 

03 = 02 = (270 - 7o) 4 

= 47o - 670 + 4 7o - 7o -4 7 2 (2 - H 70 + 167o ~ l°7o + 4 7o ~ 7o) • ( 4 -18) 

V -V-^---' 

73 <0 

Thus, 03 < 73 and JM — 1. The case with p 1 = 0 can be shown similarly. □ □ 

Note that Theorem [2] does not exclude the possibility of a large number of initial iterations 
with the same polynomial, ffowever, as soon as each of the two polynomials has been used at 
least once, there will not be more than two subsequent iterations with the same polynomial. 
For Algorithm [I] with acceleration, it is possible to show that there cannot be more than 
three subsequent iterations with the same polynomial. 

5 Numerical experiments 

This section provides numerical illustrations of the proposed stopping criteria showing that 
they work well for dense and sparse matrices regardless of what method is used to select 
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matrix elements for removal. All tests in this section are performed in Matlab R2015b. We 
will here use the regular SP2 expansion without acceleration. Similar results can be shown for 
expansions with other polynomials, such as McWeeny or accelerated SP2. It will be assumed 
that non-overlapping intervals containing the homo and lurno eigenvalues, respectively, are 
known before the start of the expansion so that Algorithm [I] can be used for selection of 
polynomials and Algorithm 0 for the expansion. To get a regular (nonaccelerated) expansion 
the parameters j3\ 0 and 7 i 0 are both set to 0 on lines [3] and O of Algorithm 0] giving scaling 
parameters c^, i — 1, 2,... on lines |TT] and E21 equal to 1 and n m i n equal to 2. 

In our first test we apply the SP2 expansion to a random symmetric dense matrix, see 
Figure EU The eigenvectors of the matrix were taken from a QR factorization of a matrix 
with random elements from a normal distribution. The occupied and unoccupied eigenvalues 
were distributed equidistantly in [0, 0.49] and [0.51, 1], respectively. In each iteration, the 
matrix Xi was perturbed by a random symmetric matrix with spectral norm r and elements 
from a normal distribution. The spectral norm was used to compute the observed order 
of convergence used in the stopping criterion. The figure shows that the stopping criterion 
accurately detects when numerical errors start to dominate the calculation and prevent any 
further improvement of the eigenvalues. 

We show in Figure I53Z1 that the stopping criteria work well for two different approaches 
to select matrix elements for removal in a linear scaling sparse matrix setting. The Fock 
matrix comes from a converged spin-restricted Hartree-Fock calculation for a linear alkane 
molecule C 160 H 322 using the standard Gaussian basis set STO-3G, giving a total of 1122 
basis functions and 641 occupied orbitals. In this case the eigenvalue problem (11.111 is on 
generalized form 

Fxi = A iSxi, (5.1) 

where S is the basis set overlap matrix. A congruence transformation employing the inverse 
Cholesky factor of S is used to get the eigenvalue problem on standard form as required by the 
algorithms described in this article. There are three different common approaches for removal 
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Figure 5.1: Idempotency error in each iteration of the SP2 expansion for a random symmetric 
dense 200 x 200 matrix with spectrum in [0, 1] and homo-lumo gap 0.02 located around 0.5 
and occupation number 100. In each iteration the matrix was perturbed by a random 
symmetric matrix with spectral norm r. The idempotency error is shown for 4 different 
values of r, before (blue circles and plus signs) and after (red triangles) the stop. 


of matrix elements. Each matrix element in a Fock or density matrix usually corresponds 
to the distance between two atomic nuclei. In cutoff radius based truncation, all elements 
corresponding to distances larger than a predefined cutoff radius are removed. However, in 
methods employing the congruence transformation this approach cannot be straightforwardly 
applied. In element magnitude based truncation, all elements with absolute value smaller 
than a predefined threshold value are removed. The threshold value is typically chosen based 
on practical experience without being directly linked to the actual error in the final result, 
the density matrix. However, a surprisingly simple relationship between the truncation and 
the error in the final result was developed by Rubensson et al .,- allowing us to control the 
forward error \\D — X n || 2 . The forward error is split in two parts, the error in eigenvalues 
(idempotency error) and the error in the occupied subspace. The subspace error is due to 
numerical errors and the eigenvalue error is due to numerical errors and a finite number of 
iterations. Figure I5.2al shows that our stopping criteria work well together with truncation 
based on matrix element magnitude. Figure l5.2bl shows that our stopping criteria work well 
when matrix elements are removed with control of the subspace error.- 





















(a) Truncation based on the 
magnitude relation 


(b) Truncation based on the 
spectral norm of the error matrix 


Figure 5.2: Demonstration of the stopping criterion with different ways of selecting matrix 
elements for removal in a sparse matrix setting. The idempotency error in each iteration of 
the SP2 expansion is shown for a Fock matrix coming from a Hartree-Fock calculation on a 
linear alkane using a standard Gaussian basis set. Panel (a): Magnitude based truncation 
with threshold value 8x for removing small matrix elements. Panel (b): Removal of elements 
with control of the error in the occupied subspace with the requirement that sin 6 < ex where 
6 is the largest canonical angle between the exact and perturbed subspaces. 


We will now investigate how the stopping criterion works when the observed order is 
estimated using the Frobenius and mixed matrix norms. We apply the SP2 expansion to 
a diagonal matrix of dimension 10 8 x 10 8 with occupation number 10 8 /2, homo-lumo gap 
0.1 located symmetrically around 0.5 and with otherwise equidistant eigenvalues in [0, 1]. 
The idempotency errors in each iteration measured by the Frobenius, mixed, and spectral 
norms are shown in Figure 1531 The block size for the mixed norm is 1000. As anticipated in 
Section[21 ||W — Xf\\ F never goes below 1 for such a large system and (12.191) cannot be used 
to estimate the observed order. If the spectral norm is expensive to compute, the mixed 
norm may be used in such cases. 


6 Application to self-consistent field calculations 

In this section we use the developed stopping criteria in the regular and accelerated SP2 
expansions in self-consistent field (SCF) calculations with the quantum chemistry program 
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Figure 5.3: Frobenius, mixed, and spectral norms of Xi — Xf in every iteration i of the 
recursive SP2 expansion applied to a diagonal 10 s x 10 8 matrix with occupation number 
10 s /2, homo-lumo gap 0.1 located symmetrically around 0.5, and otherwise equidistantly 
distributed eigenvalues in [0, 1]. In each iteration the matrix was perturbed by a diagonal 
matrix with random elements from a normal distribution and with spectral norm 10~ 3 . The 
block size for the mixed norm is 1000. 

Ergo . 16,17 We have performed spin-restricted Hartree-Fock calculations on a cluster of 4158 
water molecules using the standard Gaussian basis set 3-21G, giving a total of 54054 basis 
functions. As initial guess we used the result of a calculation with a smaller basis set, STO- 
3G. We used direct inversion in the iterative subspace (DIIS) for convergence acceleration 3135 
and stopped the iterations as soon as the largest absolute element of FDS—SDF was smaller 
than 10~ 3 . The hierarchical matrix library — was used for sparse matrix operations. A block 
size of 32 was used at the lowest level in the sparse hierarchical representation. The mixed 
norm with block size 32 was used both in the stopping criterion and for removal of small 
matrix elements^ with a tolerance of 10~ 3 for the error in the occupied subspace measured 
by the largest canonical angle between the exact and approximate subspaces.- 

The tests were running on the Tintin cluster at the UPPMAX computer center in Uppsala 
University using the gcc 5.3.0 compiler and the OpenBLAS 3 ' library was used for matrix 
operations at the lowest level in the sparse hierarchical representation. Each node on Tintin 
is a dual AMD Bulldozer compute server with two 8-core Opteron 6220 processors running at 
3.0 GHz. The calculations presented here are performed on a node with 128 GB of memory. 
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In each SCF cycle we compute upper and lower bounds of the homo and lurno eigenvalues 
and propagate them to the next SCF cycle. It was shown by Rubensson and Zahedi— 
that eigenvalues around the homo-lumo gap can be computed efficiently by making use of 
the ability of the recursive expansion to give large separation between interior eigenvalues. 
In that work, the Lanczos method was used to extract the desired information. Here we 
use a recent approach to compute accurate homo and lumo bounds that only requires the 
evaluation of Frobenius norms and traces during the course of the recursive expansion.— 
Intervals containing the homo and lumo eigenvalues are propagated between SCF cycles 
using Weyl’s theorem for eigenvalue movement.- 1 ^ The inner bounds for the homo and lumo 
eigenvalues are used both for the determination of polynomials in Algorithm Q] and for the 
error control. The outer bounds are used for the acceleration in Algorithm SJ When inner 
bounds for homo and lumo are not known or are not accurate we fall back to the trace- 
correcting SP2 expansion described in Algorithm El However, even if the outer bounds are 
loose Algorithm [4] can be used with a modification that the polynomials are determined on 
the fly using the condition in Line [7] in Algorithm El 

The regular and accelerated SP2 expansions are compared in Figure 16.11 Panels (a) 
and (b) show the number of iterations and wall time, respectively, for the regular and accel¬ 
erated expansions in each SCF cycle. Panel (c) shows the percentage of non-zero elements 
in the Fock and density matrices in orthogonal basis in each SCF cycle. Matrices were 
transformed from non-orthogonal basis using the inverse Cholesky factor as in the previous 
section. The number of non-zeros and the idempotency error for the regular and accelerated 
SP2 expansions in the last SCF cycle are shown in Figure IfCZl In this case, the accelerated 
expansion takes a shorter but less sparse route to the final density matrix. Thus, the peak 
memory usage is larger and in some iterations significantly more work is required. However, 
the acceleration gives a substantial reduction of the overall computational time needed for 
the recursive expansion. 

Figure [673] shows for each SCF cycle intervals containing the homo and lumo eigenvalues 
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Figure 6.1: Comparison of the regular and accelerated SP2 expansions in a self-consistent 
field Hartree-Fock calculation for a water cluster with 4158 water molecules. Panels (a) 
and (b) show the number of iterations and wall time of the recursive expansion in each self- 
consistent field cycle. Panel (c) gives the percentage of non-zero elements in the Fock and 
density matrices, F and D respectively, in orthogonal basis. See the text for more details. 


propagated from the previous SCF cycle (initial homo/lumo) and the improved intervals 
computed using information extracted from the recursive expansion (estimated homo/lumo). 

Table |T] shows upper and lower bounds n max and n min , respectively, and actual number of 
iterations n in the recursive expansion in each SCF cycle. In the two initial SCF cycles the 
homo and lurno intervals are overlapping, see Figure I(P1 and therefore we cannot use Algo¬ 
rithm [I] to determine the sequence of polynomials. Thus, an upper bound of the number of 
iterations n max based on the homo and lumo intervals cannot be computed. The acceleration 
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(b) Idempotency error measured using the 
mixed norm 


Figure 6.2: Comparison of the regular and accelerated SP2 expansions in the last cycle in a 
self-consistent held Hartree-Fock calculation for a water cluster with 4158 water molecules. 


—*— initial homo —©— initial lumo -v- estimated homo -A- estimated lumo 



Figure 6.3: For each cycle in a self-consistent held Hartree-Fock calculation for a water 
cluster with 4158 water molecules with the SP2-ACC expansion, intervals containing the 
homo and lumo eigenvalues propagated from the previous SCF cycle (initial) and computed 
as a by-product of the recursive expansion (estimated). The corresponding hgure for regular 
SP2 is essentially identical. 


has an effect as soon as the outer bounds for homo and lumo are better than the extremal 
bounds (A m i n and A max ), in our case already in the second SCF cycle. However, in the initial 
SCF cycles the outer bounds are loose, making the acceleration less effective than in later 
iterations. Starting from the third SCF cycle the upper and lower bounds on the number of 
iterations are given by Algorithm [U In iteration n min the acceleration has been turned oh, as 
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discussed in Section [3 Note that the proposed stopping criteria are used in each SCF cycle 
independently of the method for choosing polynomials. The recursive expansion is stopped 
as soon as the estimated order of convergence computed using the mixed norm is smaller 
than q = 1.8. 

Table 1: Upper estimate n max , actual number of iterations n in recursive expansion in each 
SCF cycle and the iteration n m ; n where the acceleration has been turned off. There is no 
acceleration in the first SCF cycle, and at least two iterations should be performed to be 
able to check the stopping criterion. 


SCF cycle 

1 

2 

3 

4 

5 

6 

Mmax 

- 

- 

26 

24 

24 

23 

n 

29 

20 

19 

17 

18 

17 

^min 

2 

14 

14 

14 

14 

14 


The proposed stopping criteria, the acceleration technique and efficient estimation of 
the homo and lumo eigenvalues give a significant performance improvement of the recursive 
density matrix expansion. We show with the water cluster example that the use of the 
SP2-ACC polynomials reduces the number of matrix-matrix multiplications in comparison 
to the regular SP2 scheme and the proposed stopping criteria enable us to reach the level 
of attainable accuracy without spending redundant computational effort in the stagnation 
phase. 

The length of the conditioning phase depends on the homo-lumo gap. Smaller horno- 
lumo gap will give a longer phase. The acceleration technique reduces the length of the 
conditioning phase, but has no significant impact in the purification phase. Our stopping 
criteria are designed to automatically detect when numerical errors start to dominate. By 
introducing a parameter C q satisfying (12.41) we eliminate the need to determine an iteration 
after which one can check the stopping criterion. The proposed stopping criteria can be 
checked already in the first iteration. Their efficiency depends on the closeness of the chosen 
value C q to the smallest value satisfying (12.4ft . but does not depend on the length of phases 
in the recursive expansion, and is thus independent of the homo-lumo gap. 
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7 Discussion 


The stopping criterion is an important aspect of the development process of any iterative 
method. Many works address this question for linear systems^ - — and eigensolvers.— 

In general, such stopping criteria are based on controlling the norm of a suitable residual 
or estimate of the norm of the error. The convergence path can be irregular and contain 
stagnation regions. 

Stopping criteria for iterative methods for matrix functions are often based on the rel¬ 
ative distance of the subsequent iterates Xf,. and Xk+\.— As soon as the distance becomes 
smaller than some predefined threshold value the iterations stop. However, in the presence 
of numerical errors it is hard to find an optimal threshold value. To illustrate that our 
approach to develop stopping criteria for iterative methods is applicable also to other ma¬ 
trix iterations, we derive a stopping criterion of the same type for the Newton sign matrix 
iteration in Appendix lAl 

Newton’s method to find roots of real-valued functions is locally at least quadratically 
convergent for simple roots. Also in this case, the iterations are typically stopped when some 
error measure goes below a predefined threshold value. Often one wants to continue iterating 
until numerical errors (from e.g. floating point roundoff) prevent any further decrease of the 
error. The threshold value is often chosen in terms of the expected accuracy of the evaluation 
of the function, e.g. some multiple of machine epsilon. To avoid the selection of threshold 
value one may, following the lines of the present work, analyze the convergence behavior 
and devise a stopping criterion based on the detection of a fall in convergence order, see 
Appendix [B] This is in particular useful in cases when the accuracy in the evaluation of a 
function is not known or depends non-trivially on program input parameters. 
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8 Concluding remarks 


Recursive expansions to compute the density matrix in electronic structure calculations are 
usually stopped when the idempotency error goes below some predefined tolerance. The main 
problem with such an approach is that an appropriate value for the tolerance is difficult to 
select. If the tolerance is small in relation to numerical errors coming from removal of matrix 
elements or rounding errors, the iterations will never stop. If the tolerance is large in relation 
to numerical errors, the same accuracy could be achieved with less effort. 

In previous work we addressed the never stop issue by tightening the tolerance for removal 
of small matrix elements at the end of the expansion until the desired accuracy is achieved for 
eigenvalues.- However, this approach also involves a user defined parameter for the accuracy 
in eigenvalues with potential impact on convergence and computational cost. 

The practical usefulness of our new stopping criteria proposed here can be seen in the 
context of the ERGO— ^ program where the new stopping criterion allows the number of 
input parameters to the program to be reduced, since only a single parameter for the density 
matrix construction accuracy is now needed. The new stopping criterion also solves previous 
problems with failed convergence when using default Ergo parameters; previously, when 
using the default parameters the recursive expansion failed due to rounding errors in some 
cases, particularly for larger molecules where the effect of rounding errors is more pronounced. 

If the homo and lumo eigenvalues are known in advance one may compute an upper bound 
for the number of iterations in advance by iterating until the homo and lumo eigenvalues 
are within rounding error from their desired values. The number of iterations in such an 
approach corresponds to n max in the present work. For high accuracy calculations this may 
be a reasonable approach but often it would lead to superfluous iterations, as for example 
can be seen for the water cluster calculations in Section [HI 

In the present work three phases of the recursive expansion were identified: condition¬ 
ing, purification, and stagnation. The appropriate moment to stop the expansion is at the 
transition between purification and stagnation. At this transition there is a drop in the 
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order of convergence. By detection of this drop we are able to stop the expansion at the 
appropriate moment without any user defined parameters. The transition to stagnation is 
accurately detected even if the idempotency error continues to slowly decrease. By altering 
the asymptotic error constant in the observed order of convergence we avoid an early stop 
in the conditioning phase. 
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A Sign matrix iterations 

Applying Newton’s method to the function f(X) = X 2 — I gives an iteration 



(A.l) 


for the sign matrix function which converges quadratically provided that A G C nxn has no 
eigenvalues on the imaginary axis. 6 


Since the matrices X k and X k+1 — X k commute, the Taylor expansion of the matrix 
function f(X k+1 ) = X 2 +l - I is given bj4L4& 


f(X k+1 ) = f(X k ) + f\X k ){X k+ 1 - X k ) + R 2 (X k ) 

= /( X k ) + /W(-(/W)"7M) + MXk) 


(A.2) 


(A.3) 


M x k ), 


(A.4) 
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where the truncation error for the spectral norm— is bounded 


R 2 (X k )|| 2 < l\\X k+1 - X k \\l max || f"(X k + s(X k+1 - X k ))\\ 2 (A.5) 

Z se [ 0 , 1 ] 

< h/(W)ll 2 ll(/'(W))- 1 |l 2 max || f"(X k + a(X l+1 - W))|| 2 (A.6) 

Z sG[0,l] 

= jl|X k - 1 Hl||/(X i )||', (A.7) 


where we have used that /'( X k ) = 2X k and f"(X k ) = 21. 


Thus in exact arithmetics 


II/(-Afc+i)112 < j|IW 1 |llll/(W)|li, 


(A.8) 


which suggests to stop the iterations as soon as 


\\f(X k )\\ 2 <l and 


log(||/(X fc+1 )|l 2 /g fc ) 

log(||/(X fc )|| 2 ) - • ’ 


(A.9) 


where 


Ck > jIIWII’. (A.10) 

The value of C k should be chosen the smallest possible, see the discussion in section [2] 

If in addition we assume that the matrix A is normal, then for k > 1 the absolute values 
of the eigenvalues of X k are bounded from below by 1 and 11AT^T 1 111 < 1. Thus for all k > 1 
we define 


c k -.= c = 


1 

4" 


(A.ll) 
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B Stopping criteria for Newton’s method 


Let f(x) be a twice continuously differentiable function, f(x*) = 
Newton’s method 


x k+ i =x k + Ax k , Ax k 


/Qfc) 

f'(x k ) 


0, and f\x*) ^ 0. Then, 

(B.l) 


is locally quadratically convergent to x*. 

Assume that f'(x k ) ^ 0. Taylor expansion around x k with step Ax k and using (IB.ID and 
Lagrange’s form of the remainder gives 


< R2 > 

where £ k is some value between x k and x k +i- Following the ideas of the present article, this 
suggests the stopping criterion: stop as soon as 


|/(x fc )|<l and 


Iog|/(sfc+i)/Cfc 

log|/(®fc)l 


(B.3) 


where 


C k 


F k 

2 wMr 


F k> max |/"(0I- 

4 between x^ and Xk+i 


(B.4) 


We note that if f(x k ) comes close to zero C k becomes very large and the stopping criterion 
is not triggered. Thus, there is no need for special treatment in such cases. 
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